An approach for estimating haplotype diversity from sequences with unequal lengths

Authors

Abstract

Quantifying genetic diversity and its spatial–temporal patterns is crucial for understanding a species’ evolutionary history and population dynamics. The most widely used approaches for describing genetic diversity from mitochondrial DNA sequences are nucleotide diversity (π) and haplotype diversity (h; Goodall-Copestake et al., 2012; Miraldo et al., 2016). The calculation of genetic diversity traditionally depends on intraspecific sequences with equal lengths (hereafter, equal intraspecies sequence lengths). However, intraspecific sequences in public databases such as GenBank (https://www.ncbi.nlm.nih.gov/) and BOLD (http://www.boldsystems.org/) vary in length, which poses a significant challenge to quantifying genetic diversity with the traditional methods. Recent developments in calculating π have allowed the investigation of π from data with unequal intraspecies sequence lengths (Miraldo et al., 2016), but there is still no method or strategy to calculate h, another important parameter sculpting the property of genetic diversity, from sequences of unequal length. Previous studies have quantified diversity through pairwise comparison of the differences between distinct haplotypes (Goodall-Copestake et al., 2012; Nei & Li, 1979; Tajima, 1981). Since h is mathematically related to nucleotide diversity, it can be inferred by estimating π. For instance, a previous study using cytochrome c oxidase subunit I (COI) genes of 23 animal species revealed a quantitative relationship between π and h (Goodall-Copestake et al., 2012). This suggested that h might be estimated by constructing a model that combines π and h. Obtaining accurate values of π and h is a prerequisite for building a model incorporating the two parameters. However, since π can only be approximately estimated from sequences with unequal lengths, existing methods cannot handle such data. To the best of our knowledge, a model-dependent approach for estimating h from such data is not available. Tajima uncovered the relationship between π and segregating sites (Tajima, 1989, 1993), which provides a new idea to estimate h: segregating sites between randomly chosen sequences can serve as criteria to judge whether those sequences are different haplotypes. Based on this, we developed an approach to estimate h with the same parameters from sequences with unequal lengths. The approach is capable of estimating h without summarizing population-level haplotypes and their relative frequencies. Its advantage is that we can now analyse sequences with variable lengths, thus dealing with the main shortcoming of the traditional methods. We therefore present this novel approach, explore its theoretical basis and evaluate its performance using sequences of terrestrial vertebrates (amphibians, birds and mammals) and Homo sapiens with unequal lengths. Finally, we applied the approach to explore the latitudinal pattern of h using cytochrome b (CYTB) and COI sequences of terrestrial vertebrates.
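The idea described above, that segregating sites between two sequences can decide whether they belong to different haplotypes, can be illustrated with a short Python sketch. It is not the authors' R script (Fan et al., 2021a), and the helper names are hypothetical. It assumes the sequences are already aligned, treats `-`, `N` and `?` as missing data (an assumption), and counts a pair as distinct when it differs at one or more sites that both sequences cover. For equal-length sequences without missing data, the fraction of distinct pairs equals the classical estimator h = n/(n-1)(1 - Σp_i²), which the sketch also computes for comparison. With missing data, "identical" is no longer transitive between pairs, so the pairwise value is an approximation.

```python
from collections import Counter
from itertools import combinations

# Assumption: gaps and ambiguity codes are treated as missing data.
MISSING = set("-Nn?")


def compare_pair(seq_a, seq_b):
    """Return (shared_sites, differences) over positions where both sequences have a base."""
    shared = 0
    differences = 0
    for a, b in zip(seq_a, seq_b):  # zip() stops at the end of the shorter sequence
        if a in MISSING or b in MISSING:
            continue
        shared += 1
        differences += a.upper() != b.upper()
    return shared, differences


def haplotype_diversity_pairwise(seqs):
    """Hypothetical helper: fraction of overlapping pairs that differ at >= 1 shared site."""
    informative = 0
    distinct = 0
    for seq_a, seq_b in combinations(seqs, 2):
        shared, differences = compare_pair(seq_a, seq_b)
        if shared == 0:
            continue  # pairs with no overlapping sites carry no information
        informative += 1
        distinct += differences > 0
    if informative == 0:
        raise ValueError("no pair of sequences overlaps")
    return distinct / informative


def haplotype_diversity_nei(seqs):
    """Classical h = n/(n-1) * (1 - sum(p_i**2)) for equal-length, gap-free sequences."""
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    counts = Counter(seqs)
    return n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))


if __name__ == "__main__":
    equal = ["ACGT", "ACGT", "ACGA", "TCGA"]
    print(haplotype_diversity_pairwise(equal), haplotype_diversity_nei(equal))
    print(haplotype_diversity_pairwise(["ACGT--", "ACGTAA", "-CGTAT"]))
```

For the equal-length toy sample both functions return 5/6. For the three unequal-length sequences only the last pair differs at a shared site, so the pairwise estimate is 1/3.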
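Our work may promote further quantification of global genetic diversity with multiple metrics (e.g. π and h) regardless of sequence length, given the increasing availability of sequence data.

To investigate whether our approach would achieve the same computational accuracy as the traditional approach (equal-length data), we used 951 datasets (97 Amphibians COI, 87 Amphibians CYTB, 242 Birds COI, 79 Birds CYTB, 244 Mammals COI and 201 Mammals CYTB) artificial 20 nucleotides an R script (Fan et al., 2021a). Besides, we also calculated h with the ‘DNA Polymorphism’ function of DNAsp V5 (Librado & Rozas, 2009). Before calculations, the sequences in each dataset were aligned with MUSCLE (Edgar, 2004) using the default settings. We assessed accuracy by comparing our estimates with those of DNAsp (Librado & Rozas, 2009) using the Mann–Whitney U test (wilcox.test in R). We used the mean value of Ri to determine accuracy and the standard deviation (SD) to measure stability. evaluate quantifying data, stability compared tests Additionally, we reported the average h of three vertebrate groups (Birds, Mammals and Amphibians), which are popular for representing a specific region's diversity (areas being grid cells or latitudinal bands) in recent studies (Gratton, Marta, Bocksberger, Winter, & Keil, 2017; Millette et al., 2019). Gene fragments (CYTB, COX3, D-loop and HVR-1) and the complete mitochondrial genome of H. sapiens were used to run the ‘Random Length analysis’ on a broader range of datasets.

We used beta regression and linear models (Ferrari & Cribari-Neto, 2004; Miraldo et al., 2016) to relate h to latitude. Considering that previous research demonstrated a poleward decreasing trend in π, for simplicity we used a model (formula: h ~ latitude + latitude²; weights: D) to account for variation along latitude. Quadratic terms were introduced to better fit the data because the peak is not always at the equator and trends can be opposite in the northern and southern hemispheres. As the similarity of gene copies tends to decay with geographical distance (… 2017), the distance between pairs of conspecific sequences (D) was calculated with the ‘distm()’ function and the ‘distVincentyEllipsoid’ method provided in the geosphere package (Hijmans et al., 2019) and used to weight the models to eliminate its influence on the results. We ran independent models for amphibians, mammals and birds, using CYTB and COI as examples for testing.

obtained (Figure S1). Across species of Birds, Mammals and Amphibians, h estimated by our approach did not differ significantly from that of DNAsp (Birds W = 20,402, p = 1; W = 2,245; W = 22,898; W = 13,612; W = 2,888; W = 3,362.5, p = 1), indicating that the approach is robust when performing analyses (Figure S1A). The results of the random length analysis showed that the estimation was significantly higher than … (Figure 2a) in all cases except … (W = 33,649, p < 0.05; W = 3,377, p = 0.372; W = 36,598, p < 0.001; W = 29,711; W = 5,674; W = 4,871, p < 0.01; tests).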
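The latitudinal model described above (h ~ latitude + latitude², weighted by D) can be sketched as follows. This is a deliberately simplified stand-in, not the authors' analysis: it uses weighted least squares with numpy instead of the weighted beta regression (Ferrari & Cribari-Neto, 2004) named in the text, and it takes the weights D as given rather than computing them with geosphere's `distm()` and `distVincentyEllipsoid`. The function names are hypothetical. A negative quadratic coefficient b2 gives a hump-shaped curve whose peak lies at latitude -b1/(2*b2), which need not be the equator.

```python
import numpy as np


def fit_weighted_quadratic(latitude, h, weights):
    """Hypothetical helper: weighted least-squares fit of h ~ b0 + b1*lat + b2*lat**2.

    Simplified stand-in for a weighted beta regression: errors are treated as
    Gaussian on the identity scale (no logit link, no beta likelihood).
    """
    lat = np.asarray(latitude, dtype=float)
    y = np.asarray(h, dtype=float)
    w = np.asarray(weights, dtype=float)
    if np.any(w <= 0):
        raise ValueError("weights must be positive")
    design = np.column_stack([np.ones_like(lat), lat, lat ** 2])
    root_w = np.sqrt(w)
    # Scaling each row by sqrt(w) turns weighted least squares into ordinary least squares.
    coef, *_ = np.linalg.lstsq(design * root_w[:, None], y * root_w, rcond=None)
    return coef  # [b0, b1, b2]


def peak_latitude(coef):
    """Latitude of the vertex of the fitted parabola (a maximum when b2 < 0)."""
    _, b1, b2 = coef
    if b2 == 0:
        raise ValueError("the quadratic term is zero, so there is no peak")
    return -b1 / (2 * b2)


if __name__ == "__main__":
    lat = np.linspace(-60, 60, 13)
    h = 0.9 + 0.001 * lat - 0.0001 * lat ** 2  # synthetic, noise-free example values
    d = np.linspace(1.0, 5.0, 13)  # stand-in for the distance weights D
    coef = fit_weighted_quadratic(lat, h, d)
    print(coef, peak_latitude(coef))
```

With the synthetic curve above the fit recovers b2 = -0.0001 and places the peak at 5 degrees north rather than at the equator, which is the situation the quadratic term is meant to capture.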
indicate more suitable estimates (Figure 2b) (except but 34,824, 3,572, 0.116; 34,356, 26,307, 5,215, 0.19; 4,317, 0.289; tests), suggesting that length variation influences h as well as π. when taxon, smaller (Figure 2). Consistent result found both (Table These suggest reliable broader-resourced

Relying on our method, the latitudinal gradient of haplotype diversity/nucleotide diversity had a negative term (Figures 3 and 4, Table 1). similar group. observed decrease (Figures 3b and 4b), (Figures 3d and 4d) amphibians (Figure 4f). followed hemispheres (Figures 3a,c,e and 4a,c,e). In particular, hemisphere weaker (Figures 3a and 4a), trends (Figures 3c,e and 4c,e).

Genetic diversity, consisting of π and h, is fundamental to biodiversity (Robert, 1994). Over the last few years, measuring genetic diversity has become an attractive topic in conservation. rapid increase large current V5) available datasets, underlining the necessity of a new method. early (1983 and 1989), discussed described how heterozygosity (segregating sites). Thanks to Tajima's contribution and inspiration, we propose an approach to estimate h based on segregating sites. It should be pointed out that the basis of our approach (the sequences) is the same as Tajima's, but the approach presented here does not require number result, deal varied thereby augment employ provide tool spatial consistent other performs (Figure 2), deriving missing associated high (Figure S2). Although variability ideal metric large, kij time-consuming limitation Hence, future time-saving big paper uses Moreover, taxa evidence context-dependent previously been strictly positive. mathematical (), k constant (defined in Equation 12), degree difference samples sample size (kij), causing correlation nucleotide–haplotype one reason why difficult observe directional (negative or positive; Bird 2007; Song 2013; Wang Zhang validated A showing low signature recently diverged populations (Garg & Mishra, 2018; 2014). could potential divergence; small suggests divergence populations. An example application testing vertebrates, tropics, towards poles taxon (Figure 4).
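The point above, that the nucleotide–haplotype relationship is context-dependent rather than strictly positive, can be made concrete by computing both statistics from the same pairwise comparisons. In this Python sketch (hypothetical helper, not the authors' code), k_ij is read as the number of differences between sequences i and j at the sites both cover; π is the mean of k_ij divided by the number of shared sites (a pairwise-deletion convention assumed here), and h is the fraction of pairs with k_ij > 0. The loop over all n(n-1)/2 pairs also shows why computing k_ij becomes time-consuming for large datasets.

```python
from itertools import combinations

# Assumption: gaps and ambiguity codes are treated as missing data.
MISSING = set("-Nn?")


def pairwise_stats(seqs):
    """Hypothetical helper returning (h, pi) from all overlapping pairs of aligned sequences.

    k_ij = number of differences between sequences i and j at their shared sites.
    h    = fraction of pairs with k_ij > 0
    pi   = mean of k_ij / shared_sites over pairs (pairwise deletion)
    """
    n_pairs = 0
    n_distinct = 0
    pi_total = 0.0
    for seq_a, seq_b in combinations(seqs, 2):  # n(n-1)/2 comparisons
        shared = [
            (a, b)
            for a, b in zip(seq_a, seq_b)
            if a not in MISSING and b not in MISSING
        ]
        if not shared:
            continue  # no overlapping sites: skip the pair
        k_ij = sum(a.upper() != b.upper() for a, b in shared)
        n_pairs += 1
        n_distinct += k_ij > 0
        pi_total += k_ij / len(shared)
    if n_pairs == 0:
        raise ValueError("no pair of sequences overlaps")
    return n_distinct / n_pairs, pi_total / n_pairs


if __name__ == "__main__":
    star = ["AAAAAAAAAA", "AAAAAAAAAT", "AAAAAAAATA", "AAAAAAATAA"]
    divergent = ["AAAAAAAAAA", "AAAAAAAAAA", "TTTTTAAAAA", "TTTTTAAAAA"]
    print(pairwise_stats(star))
    print(pairwise_stats(divergent))
```

In the toy data, one reference sequence plus three variants that each differ from it at one site gives h = 1.0 and π = 0.15, whereas two divergent haplotypes separated by five sites give h of about 0.67 and π of about 0.33. Ranking the two samples by h and by π therefore gives opposite orders, so the correlation between the two metrics need not be positive.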
updated pattern assessment Furthermore, overlap percentage database (94.37%) larger simulation (74.90%), derived real expected least analyses, multi-metrics fully Here, we briefly discuss explanations for the declining trend, which may reflect the strength of natural selection (Camus et al., 2017) and migratory ability (… 2017). For example, configuration land ocean hemispheres, sea surface temperatures stable owing area south north (Fordham Hong 2019). explanation greater hemisphere, amphibians. shown bird turnover response climate change (Virkkala & Lehikoinen, assemblage sensitive change, relatively birds. In summary, present use case confirmed reliability. Despite fact approximation, lays solid foundation precise accommodate accelerated lead fruitful era

We thank Xiaolu Jiao and Xin Yu for assistance with data analysis, Weiwei Zhai, Liang Ma and Hechuan Yang for discussion, and Huijie Qiao for his generous help during revision.

This work was funded by the Strategic Priority Research Program of the Chinese Academy of Sciences (XDA19050202 to F.L.), the Second Tibetan Plateau Scientific Expedition and Research (STEP) program (2019QZKK0304 to F.L. and G.S.), the National Natural Science Foundation of China (32070434 and 31572291 to G.S.; 31630069), the Science and Technology Basic Resources Survey (2019FY100204 to P.F.) and the China Scholarship Council (Grant/Award Number: [2017]7011 to P.F.).

F.L., J.F. conceived designed methodology. collected P.F., X.L. Y.D. analysed wrote original draft. J.F., G.S., X.L., Y.C., Y.Q. reviewed edited text. All authors contributed constructive comments and approved the submitted version of the manuscript.

The peer review history for this article is available at https://publons.com/publon/10.1111/2041-210X.13643.

The dataset and codes are available in the GitHub repository: https://github.com/PingFan6/estimating-haplotype-diversity. The files are archived in Zenodo (https://doi.org/10.5281/zenodo.4722108). A summary of the online datasets is deposited in the Dryad Repository (https://doi.org/10.5061/dryad.ghx3ffbnt).

Please note: The publisher is not responsible for the content or functionality of any supporting information supplied by the authors. Any queries (other than missing content) should be directed to the corresponding author for the article.


Similar resources

Heterodyne interferometer with unequal path lengths

Laser interferometry is an extensively used diagnostic for plasma experiments. Existing plasma interferometers are designed on the presumption that the scene and reference beam path lengths have to be equal, a requirement that is costly in both the number of optical components and the alignment complexity. It is shown here that having equal path lengths is not necessary, instead, what is requir...


An Empirical Approach for Estimating Stress-Coupling Lengths for Marine-Terminating Glaciers

Climate Change Institute, University of Maine, Orono, ME, USA, 2 School of Earth and Climate Sciences, University of Maine, Orono, ME, USA, 3 Alaska Science Center, US Geological Survey, Anchorage, AK, USA, Department of Geological Sciences, University of Idaho, Moscow, ID, USA, Department of Earth System Science, University of California Irvine, Irvine, CA, USA, 6 Institute for Geophysics, Uni...


Inferring Demographic History from a Spectrum of Shared Haplotype Lengths

There has been much recent excitement about the use of genetics to elucidate ancestral history and demography. Whole genome data from humans and other species are revealing complex stories of divergence and admixture that were left undiscovered by previous smaller data sets. A central challenge is to estimate the timing of past admixture and divergence events, for example the time at which Nean...


A new method for estimating the demographic history from DNA sequences: an importance sampling approach

The effective population size over time (demographic history) can be retraced from a sample of contemporary DNA sequences. In this paper, we propose a novel methodology based on importance sampling (IS) for exploring such demographic histories. Our starting point is the generalized skyline plot with the main difference being that our procedure, skywis plot, uses a large number of genealogies. T...


An MDL Method for Finding Haplotype Blocks and for Estimating the Strength of Haplotype Block Boundaries

We describe a new method for finding haplotype blocks based on the use of the minimum description length principle. We give a rigorous definition of the quality of a segmentation of a genomic region into blocks, and describe a dynamic programming algorithm for finding the optimal segmentation with respect to this measure. We also describe a method for finding the probability of a block boundary...



Journal

Journal title: Methods in Ecology and Evolution

Year: 2021

ISSN: ['2041-210X']

DOI: https://doi.org/10.1111/2041-210x.13643